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1.  Introduction 


The  basic  problem  of  image  recovery  and 
pattern  recognition  is  to  determine  the  original 
pattern,  f,  given  its  corrupted  version,  g.  The 
unknown  pattern  f  is  an  element  of  the  set  S  = 
{f f2,  and  the  task  is  to  deduce  which 
pattern  in  S  gave  rise  to  the  image  data,  g.  S  is 
called  the  solution  candidate  space  and  could 
be,  for  example,  the  set  of  alphabetical  symbols. 

If  it  is  known  that  certain  elements  of  S  have 
a  higher  probability  of  occurring  than  others 
(such  as  alphabetical  symbols  in  text),  this  a  priori 
information  can  be  incorporated  into  the  proce¬ 
dure  for  finding  f  according  to  the  techniques  of 
Bayesian  analysis  (ll- 

In  the  general  image  recovery  problem,  S  is 
the  set  of  all  possible  patterns  on  an  n  x  n  pixel 
image,  and  the  relationship  between  the  origi¬ 
nal  image,  f,  and  the  image  data,  g,  can  be 
modeled  by 


g  =  f  +  w,  /  (1 p 

where  w  is  random  noise.  Sets  of  pixel  bright¬ 
nesses  at  lattice  position  (i,  j)  are  described  by  f 
=  {f[.},  g  =  {gf.j},  and  w  =  { w'}.  Since  w  is  a  random 
variable,  f  cannot  be  determined  from  g  using 
equation  (1).  The  best  w£  can  do  is  find  the 
pattern  f*  e  S  such  that  p(f  I  g)  is  a  maximum, 
where  p(f  I  g)  is  the  probability  that,  given  the 
image  data  g,f  was  the  original  image.  We 
cannot,  even  in  principle,  be  sure  that  f  *  =  f  or  is 
even  "close"  to  it.  If,  however,  f  is  known  "in 
advance"  (using  known  test  images  that  have 
been  corrupted  by  noise),  the  "goodness"  of  f* 
can  be  evaluated.  This  is  what  we  have  done. 

^fmage  recovery  and  pattern  recognition 
problems  are  thus  combinatorial  optimization 
problems,  in  which  a  solution  candidate  space, 
S,  must  be  searched.  The  larger  S  is,  the  more 
difficult  the  search.  The  number,  N,  of  images 
possible  on  an  n  x  n  pixel  array  where  each  pixel 
has  L  possible  grey  levels  is  given  by 

N  =  L"2  .  (2) 


Even  for  a  simple  10  x  10  binary  image 
where  L  =  2  and  n  =  10,  N  =  2100,  so  calculating 
p(f  I  g)  for  each  possible  f  is  computationally 
infeasible,  even  in  simple  cases.  A  problem 
whose  computational  difficulty  grows  expo¬ 
nentially  with  some  characteristic  problem  size 
measure  is  called  NP  (nonpolynomial)  com¬ 
plete  [2].  Most  NP  complete  problems  can  only 
be  solved  by  selectively  sampling  elements  of 
the  solution  candidate  space. 


2.  Bayesian  Analysis 

The  probability  of  "event"  f,  given  that  g 
has  occurred,  p(flg),  is  called  a  conditional 
probability.  The  probability  of  f  alone,  p(f ),  and 
that  of  g  alone,  p(g),  may  overlap  in  "probabil¬ 
ity  space"  as  illustrated  in  figure  1.  Obviously, 
p(f  I  g)  and  p(g  I  f)  are  given  by 


LT 


p(f'g)  = 


p(^g) 

p«g) 


p(g|f)  = 


p(**g) 

K9  ' 


(3) 


where  p(f  Ag)  is  the  probability  of  both  f  and  g. 
Eliminating  p(f  Ag)  from  (3)  yields 

W* fcgj®  •  <« 

Equation  (4)  expresses  Bayes'  theorem. 

The  probability  p(f)  expresses  our  a  priori 
(independent  of  the  image  data,  g)  knowledge 
of  f,  whereas  p(f  I  g)  represents  our  a  posterioi  i 
(given  the  image  data)  information  about  f. 
Thus  f*  e  S  which  maximizes  p(f  I  g)  is  the  best 
fit  to  both  the  a  priori  and  the  a  posteriori  knowl¬ 
edge  about  f.  Bayes'  theorem  allows  us  to  incor- 


Figure  1.  Probability  space  for  Bayes'  Theorem. 
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porate  the  "experimental  data,"  g,  with  our 
prior  knowledge  of  f.  This  approach  to  the 
image  restoration  and  pattern  recognition  prob¬ 
lem  is  often  called  the  maximum  a  posteriori 
(MAP)  technique. 


3.  Combinatorial  Optimiza¬ 
tion  Problem 

We  are  now  ready  to  obtain  an  analytic  ex¬ 
pression  for  p(f  I  g)  using  equation  (4). 

From  equation  (1 ),  p(g  I  f )  =  p(  w),  the  proba¬ 
bility  of  having  noise  w.  p(g  I  f)  can  be  repre¬ 
sented  in  the  form 

P(gl f)  =  ^r«N(t6)  ,  (5) 

where  HN( f,g)  is  the  noise  energy  and  N  a  nor¬ 
malization  term.  If,  for  example,  the  noise  is 
Gaussian,  HN( f,g)  =  1 1  f-g  1 1 2/(2o2),  where  1 1  f 
-  g  1 1  is  a  distance  measure  between  f  and  g 
(e.g.,  the  number  of  pixels  in  which  they  differ), 
and  a2  is  the  variance. 

Since  the  probability  of  obtaining  g  alone, 
p(g),  is  independent  of  f,  it  can  be  regarded  as  a 
constant  with  respect  to  f .  We  are  thus  left  with 
the  problem  of  determining  p(f). 

Images  tend  to  consist  of  large  regions  of 
constant  or  slowly  varying  brightness,  sepa¬ 
rated  by  edges.  Within  a  region,  a  pixel's  bright¬ 
ness  is  expected  to  be  similar  to  its  nearest 
neighbors,  so  a  pixel  whose  brightness  differs 
greatly  from  its  neighbors  has  "probably"  been 
affected  by  noise  (fig.  2).  An  image  can  thus  be 
modeled  as  a  set  of  pixels  whose  brightnesses 
depend  on  their  nearest  neighbors'  brightness 
probabilistically.  Such  a  system  of  interrelated 
probabilities  is  called  a  Markov  random  field 
[3]. 

It  can  be  shown  that  a  Markov  random  field 
is  formally  equivalent  to  an  Ising  [4,5]  system  of 
interacting  spins  on  a  lattice.  Analogous  to  the 
Ising  spin-spin  interaction,  a  pixel-pixel  inter¬ 
action  energy  can  be  defined.  The  sum  of  the 
pixel-pixel  interaction  energies  is  the  image 
energy,  HQ(f).  Following  the  statistical  mechani¬ 


Figure  2.  Markov  random  field  model  of  an 
image.  Brightness  of  center  pixel,  C,  is  related 
probabilistically  to  brightness  levels  of  its 
nearest  neighbors.  Usually,  3x3  neighborhoods 
are  used.  In  the  example  shown  above,  C  is 
lighter  than  its  surrounding  neighbors,  so  it  is 
probably  "really"  darker. 


cal  analogy,  the  probability  of  an  image  f  is 
determined  according  to  the  Boltzmann  distri¬ 
bution  by 

p(f,=d*(0 ,  (6> 

Zo 


where  Z0  is  a  normalization  term.  The  pixel- 
pixel  interaction  energy  thus  represents  our 
a  priori  knowledge  of  f .  Finding  H0(f),  given  f, 
logically  and  consistently  is  a  difficult  subject 
that  will  not  be  addressed  in  this  paper,  but  it  is 
not  difficult  to  construct  something  "reason¬ 
able." 

Equations  (5)  and  (6)  can  now  be  inserted 
into  equation  (4)  to  yield 


P(fl8)  = 


(7) 


where  H(f,g)  =  H0(f )  +  HN( f,g)  and  terms  that  are 
constant  with  respect  to  f  have  been  absorbed 
into  Z.  The  ratio  H0/HN  indicates  the  impor¬ 
tance  of  the  a  priori  information  about  f  relative 
to  the  data. 

The  problem  of  minimizing  p(f  I  g)  has  thus 
been  transformed  into  that  of  minimizing  H( f,g) 
over  the  space  of  all  possible  f  e  S.  Since  S  is 
very  large  we  cannot  possibly  examine  H( f,g) 
for  all  f.  Instead  we  use  a  stochastic  approach 
called  simulated  annealing. 
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4.  Simulated  Annealing 

In  the  physical  annealing  process,  a  system 
is  heated  to  a  high  temperature  and  then  slowly 
cooled,  allowing  the  system's  atoms  or  spins  to 
collectively  "search"  the  set  of  all  possible 
configurations  for  the  one  with  the  lowest  en¬ 
ergy  consistent  with  the  appropriate  constraints. 
Physical  annealing  thus  "performs"  combina¬ 
torial  optimization  over  a  huge  solution  candi¬ 
date  space. 

Simulated  annealing  mimics  the  essentials 
of  the  annealing  process  to  solve  the  combina¬ 
torial  optimization  problems.  Restating  the 
problem,  we  seek  f*  e  S  for  which  H(f,g)  is  a 
minimum.  The  essential  steps  of  the  simulated 
annealing  procedure  are  the  following: 

1.  Select  a  starting  "temperature,"  T,  (the 
meaning  of  temperature  will  become  clear  as 
we  proceed). 

2.  Randomly  select  from  S  a  solution 
candidate,  f2;  f,  is  the  current  solution  candi¬ 
date. 

3.  Calculate  E,  =  H(f,). 

4.  Choose  a  second  solution  candidate,  f2, 
"nearby"  the  current  solution  candidate  (e.g., 
change  one  pixel  of  the  image)  and  calculate  E2 
=  H(  f)2. 

5.  If  E2  <  Ej,  accept  f2  as  the  current  solution 
candidate.  If  E2  >  E5  ,  accept  f2  as  the  current 
solution  candidate  with  probabi'ity  p (AE,T1), 
where  A E  =  E2  -  Er  Return  to  step  3,  with  the 
current  solution  candidate  in  place  of  fr 

6.  After  a  "large"  number  of  cycles  through 
steps  3  to  5,  decrease  the  temperature,  and 
repeat  steps  3  to  5  with  T,  replaced  by  the  new 
temperature. 

7.  At  "sufficiently"  low  temperature  stop 
the  program.  The  current  candidate  solution  f' 
is  the  output  of  the  process. 

The  acceptance  probability,  p(AE,T),  is 
usually  taken  to  be 

P(A£-  T)  =  ,  (8) 

where  Z(T)  is  a  normalization  term.  Tempera¬ 
ture  measures  the  stochastic  noise  that  is  present 
in  the  algorithm,  much  as  it  does  in  a  physical 


system.  At  high  temperatures,  AE  >  0  transitions 
have  a  nonvanishing  probability,  allowing 
"jumps"  over  energy  barriers  between  local 
minima  and  the  global  one  (fig.  3).  As  T  —>  0, 
however,  p(AE,T)  ->  0,  for  AE  >  0,  so  that  only 
AE  <  0  transitions  occur  and  simulated  annealing 
becomes  a  gradient  descent  algorithm. 

If  cooling  is  too  rapid  (quenching)  the  sys¬ 
tem  may  "freeze"  in  a  local  instead  of  a  global 
energy  minimum.  It  has  been  shown  that  f',  the 
simulated  annealing  solution  at  the  nA  itera¬ 
tion,  converges  to  the  global  optimum,  f*,  as  n 
— » °°  for  a  logarithmic  cooling  schedule  in  which 
the  temperature,  T ,  at  the  n01  iteration  is  given 
by  Tn  =  f0/log(l  +  n),  where  T0  is  a  constant.  The 
closer  T  is  to  zero,  the  "better"  the  solution. 

n 

Unfortunately  the  temperature  drops  very 
slowly  (fig.  4),  and  there  is  no  general  error 
bound  on  the  simulated  annealing  solution  after 
a  given  finite  number  of  iterations.  There  have 
been  attempts  to  speed  up  the  algorithm  by 
using  acceptance  probabilities  other  than  the 
Boltzmann  distribution  [6]. 

Simulated  annealing  can  be  speeded  up  by 
parallel  execution  on  connection  machines  or 
specialized  neural  networks  called  Boltzmann 
machines.  Further  details  can  be  found  in  refer¬ 
ences  [7].  Alternative  optimization  procedures, 
such  as  mean  field  annealing  [8],  have  also  been 
proposed. 


Figure  3.  Goal  is  to  determine  global  minimum 
f*.  Gradient  descent  algorithms  "stick"  in  local 
minima  such  as  so  the  transition  f>— >  fb  cannot 
occur.  Simulated  annealing  is  a  stochastic 
procedure  that  sometimes  allows  transitions  to 
higher  energies  ( such  as  f.->  fb).  After  many 
iterations  the  simulated  annealing  solution  is 
close  to  the  global  optimum. 
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1.0 


Tn(n) 


0 

0  n  1000 

Figure  4.  Normalized  temperature,  Tn(tt),  after  n  simulated  annealing  iterations  under 
logarithmic  cooling  schedule.  Ttt(n)  =  TinVTil),  where  Tin)  is  the  temperature  at  the  nth 
iteration.  Tn(lOOO)  =  0.1,  so  temperature  decreases  by  only  a  factor  of  10  after  1000  cycles. 


5.  Image  Restoration 

We  used  41  x  41  binary  test  images  drawn 
from  a  set  of  character  fonts  supplied  with  the 
Turbo  C  programming  language.  The  charac¬ 
ters,  together  with  a  rectangular  background, 
were  converted  into  arrays  of  0's  and  l's.  We 
chose  the  characters  "A,"  "B,"  "E ,"  "G,"  and 
because  they  contain  a  variety  of  horizon¬ 
tal,  vertical,  diagonal,  and  curved  linesegments. 

Noise  was  added  by  sequentially  examin¬ 
ing  each  pixel  and  "flipping"  it  (changing  1  to  0 
or  vice  versa)  with  probability  p,  0  <  p  <  1.  The 
decision  whether  or  not  to  flip  was  made  by 
drawing  a  random  number,  x,  from  the  con¬ 
tinuous  range  0  <  x  <  1  for  each  pixel  and 
flipping  if  x  <  p.  We  quote  the  number  p  as  the 
noise  level  in  our  results.  Only  for  large  images 
does  it  represent  the  fraction  of  pixels  actually 
flipped. 

The  energy  change,  AE  =  H( f',g)  -  H(f,g), 
where  f  and  f'  differ  by  a  single  pixel  was 
defined  according  to  the  number  of  nearest 
neighbors  with  the  same  brightness  in  a  3  x  3 
neighborhood  on  a  square  grid  (table  1).  Edge 
pixels  were  regarded  as  neighbors  of  pixels  on 
the  opposite  side  of  the  image.  We  took  the 
normalization,  Z(T)  in  equation  (8)  to  be 

Z(T)  =  X  e^'  ,  (9) 

i 

where  the  sum  is  over  all  possible  values  of  A E. 


Table  1.  Energy  change  (AE)  of  central  pixel  (see 
fig.  1)  flipped  as  a  function  of  number  of  similar 
(same  brightness)  neighbor  pixels  (NJ 


N 

S 

AE 

8,7,6 

+2 

5 

+1 

4 

0 

3 

-1 

2,1,0 

-2 

We  performed  simulated  annealing  by  se¬ 
quentially  examining  each  pixel  and  deciding 
whether  or  not  to  flip  it  according  to  step  5  in 
section  4.  For  AE  >  0,  this  step  was  implemented 
by  choosing  a  random  number,  x,  in  the  range 
0  <  x  <  1 .  The  pixel  flip  was  accepted  in  the  case 
p(AE,T)  <  x.  It  might  have  been  closer  to  the 
"spirit"  of  simulated  annealing  to  examine  pixels 
randomly,  but  since  we  added  noise  anew  each 
time  we  ran  the  simulated  annealing  algorithm 
on  the  same  image,  not  much  was  lost. 

Instead  of  a  logarithmic  cooling  schedule, 
we  derived  one  (table  2)  by  inspecting  the 
behavior  of  p(AE,T)  with  temperature  (fig.  5).  If 
the  starting  temperature  is  too  high  the  simu¬ 
lated  annealing  procedure  may  add  more  noise 
than  it  removes,  whereas  if  it  is  too  low  we  are 
left  with  a  simple  gradient  descent  algorithm. 
We  used  a  total  of  only  10  annealing  cycles. 
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Table  2.  Simulated 
annealing  tempera¬ 
ture  schedule 

Temperature 

No.  of 
cycles 

100.0 

1 

50.0 

1 

20.0 

1 

5.0 

1 

1.2 

5 

0.6 

1 

0.3 

1 

0.0  T  1.0 


Figure  5.  Behavior  of  transition  acceptance  prob¬ 
ability,  p(A E,T),  with  temperature  for  different 
values  of  A E.  As  T  0,  p(A E,T)  ->  0;  thus  at  low 
temperature,  simulated  annealing  becomes  a 
gradient  descent 

Restoration  quality  was  studied  by  running 
the  program  repeatedly  on  the  sets  of  noisy 
images  with  "fresh"  noise  inserted  each  time. 
Restoration  quality  was  defined  as  the  fraction 
of  pixels  that  were  different  between  the  re¬ 
stored  and  the  original  image.  Our  procedure 
was  the  following:  (1)  choose  a  character  and 
corrupt  it  with  noise  level  p,  (2)  apply  the  simu¬ 
lated  annealing  algorithm,  and  (3)  measure  the 
quality  of  the  restoration.  The  procedure  was 
run  100  times  for  each  character  and  statistics 
were  compiled. 

6.  Results  and  Discussion 

Our  results  are  summarized  in  table  3,  and 
in  figure  6,  where  we  quote  the  mean  and  stan¬ 
dard  deviation  of  the  restoration  quality  for  100 


runs  on  each  character.  An  example  of  an  origi¬ 
nal  image  corrupted  with  noise  and  then  recov¬ 
ered  is  shown  in  figure  7. 

For  noise  levels  less  than  about  10  percent, 
our  simulated  annealing  procedure  does  not 
improve  image  quality,  and  in  fact  may  actu¬ 
ally  degrade  it.  This  is  not  surprising  because 
annealing  is  a  stochastic  process.  Ideally  the 
temperature  schedule  should  be  tailored  to  the 
noise  level  in  each  original  image,  but  we  used 
one  schedule  for  all  images.  Many  more  effi¬ 
cient  techniques  exist  for  extracting  patterns 
out  of  low  noise  backgrounds,  so  simulated 
annealing  would  be  of  little  use  in  this  regime 
anyway. 

As  the  data  show,  our  simulated  annealing 
procedure  did  yield  significant  improvements 
in  image  quality  in  the  20-  to  30-percent  noise 
range.  While  the  behavior  of  the  algorithm 
differed  from  character  to  character,  the  differ¬ 
ences  were  not  large. 

These  results  are  quite  promising  in  view  of 
the  fact  that  we  used  only  1 1  annealing  cycles 
with  an  ad  hoc  temperature  schedule.  For  future 
studies  we  plan  to  implement  a  logarithmic 
cooling  schedule  and  run  more  cycles.  We  also 
plan  to  examine  the  effect  of  different  accep¬ 
tance  probability  distributions,  p(AE,T),  such 
as  the  one  proposed  by  Szu  and  Hartley  [6]. 


Table  3.  Average  restoration  quality  and  stan¬ 
dard  deviation  for  100  simulated  annealing 
trials  per  character  (procedure  of  sect  33) 


Initial 

distortion 

0% 

10% 

20% 

30% 

Character 

A 

9  ±  1% 

9  ±  1% 

10  ±1% 

14  ±2% 

B 

10  ±1% 

10  ±1% 

11  ±1% 

15  ±2% 

C 

10  ±1% 

10  ±1% 

11  ±1% 

15  ±2% 

E 

10±1% 

10  ±1% 

11  ±1% 

14  ±2% 

& 

11  ±1% 

11  ±1% 

12  ±1% 

16  ±2% 
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Figure  6.  Noise 
remaining  after 

16.25 

simulated  anneal¬ 

15.00 

ing  as  a  function 
of  initial  noise 

13.75 

level  for  each 
symbol. 

12.50 

11.25 

10.00 

8.75 

Figure  7.  Image  cor¬ 
rupted  with  noise  and 
recovered:  (a)  original 
image,  (b)  with  30- 
percent  noise,  and 
(c)  after  10  simulated 
annealing  runs  11 
percent  of  the  pixels 
differ  from  the  original. 
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